
How cyber criminals are compromising AI software supply chains

06 September 2024

3 min read

Authors

Charles Owen-Jackson

Freelance Content Marketing Writer

With the adoption of artificial intelligence (AI) soaring across industries and use cases, preventing AI-driven software supply chain attacks has never been more important.

Recent research by SentinelOne exposed a new ransomware actor, dubbed NullBulge, which targets software supply chains by weaponizing code in open-source repositories like Hugging Face and GitHub. The group, claiming to be a hacktivist organization motivated by an anti-AI cause, specifically targets these resources to poison data sets used in AI model training.

Whether you use mainstream AI solutions, integrate them into your existing tech stacks via application programming interfaces (APIs) or develop your own models from open-source foundation models, the entire AI software supply chain is now squarely in the spotlight of cyberattackers.

Poisoning open-source data sets

Open-source components play a critical role in the AI supply chain. Only the largest enterprises have access to the vast amounts of data needed to train a model from scratch, so most organizations rely heavily on open-source data sets like LAION 5B or Common Corpus. The sheer size of these data sets also makes it extremely difficult to maintain data quality and compliance with copyright and privacy laws. By contrast, many mainstream generative AI models like ChatGPT are black boxes that rely on their own curated data sets, which comes with its own set of security challenges.

Verticalized and proprietary models may refine open-source foundation models with additional training on their own data sets. For example, a company developing a next-generation customer service chatbot might use its previous customer communications records to create a model tailored to its specific needs. Such data has long been a target for cyber criminals, but the meteoric rise of generative AI has made it all the more attractive to nefarious actors.

By targeting these data sets, cyber criminals can poison them with misinformation or malicious code and data. Then, once that compromised information enters the AI model training process, we start to see a ripple effect spanning the entire AI software lifecycle. It can take thousands of hours and a vast amount of computing power to train a large language model (LLM). It’s an enormously costly endeavor, both financially and environmentally. However, if the data sets used in the training have been compromised, chances are the whole process has to start from scratch.
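
One common safeguard against this kind of tampering is to pin a cryptographic digest of every data set file at the time it is vetted, then re-verify those digests before each training run so a poisoned shard is caught before it ever enters the pipeline. The sketch below is illustrative only; the file name and pinned hash are placeholder values, not from any real data set.

```python
import hashlib
from pathlib import Path

# Hypothetical allowlist of known-good SHA-256 digests, recorded when the
# data set was originally vetted (the value below is a placeholder).
PINNED_HASHES = {
    "train_shard_00.jsonl": "0000000000000000000000000000000000000000000000000000000000000000",
}

def sha256_of(path: Path) -> str:
    """Stream the file in chunks so large data set shards don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(data_dir: str) -> list[str]:
    """Return the names of files whose digest no longer matches the pinned value."""
    tampered = []
    for name, expected in PINNED_HASHES.items():
        if sha256_of(Path(data_dir) / name) != expected:
            tampered.append(name)
    return tampered
```

A non-empty return value means the training run should be halted and the data set re-vetted rather than trained on.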


Other attack vectors on the rise

Most AI software supply chain attacks take place through backdoor tampering methods like those mentioned above. However, that’s certainly not the only way, especially as cyberattacks targeting AI systems become increasingly widespread and sophisticated. Another method is the flood attack, where attackers send huge amounts of non-malicious information through an AI system in an attempt to cover up something else — such as a piece of malicious code.
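
A common first line of defense against flood-style traffic is rate limiting at the system boundary, so no single caller can drown the system in requests. As a purely illustrative sketch (not a description of any specific product), a token-bucket limiter caps the average request rate while still allowing short bursts:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: at most `rate` requests per second on
    average, with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be throttled."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice a limiter like this would be keyed per client or per API key, so a flood from one source doesn't starve legitimate users.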

We’re also seeing a rise in attacks against APIs, especially those lacking robust authentication procedures. APIs are essential for integrating AI into the myriad functions businesses now use it for, and while it’s often assumed that API security rests with the solution vendor, in reality, it’s very much a shared responsibility.
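
At a minimum, robust authentication means every request to an AI endpoint carries a verifiable credential. The sketch below shows one way that check might look; the header format and `AI_API_KEY` environment variable are assumptions for illustration, not any particular vendor's API.

```python
import hmac
import os

def authenticate(headers: dict) -> bool:
    """Accept a request only if its bearer token matches the expected key.

    The key is read from the environment because hard-coding secrets in
    source code is itself a supply chain risk, and hmac.compare_digest
    performs a constant-time comparison to avoid timing side channels.
    """
    expected = os.environ.get("AI_API_KEY", "")
    supplied = headers.get("Authorization", "").removeprefix("Bearer ").strip()
    return bool(expected) and hmac.compare_digest(supplied, expected)
```

The shared-responsibility point shows up here concretely: the vendor can offer key-based authentication, but it falls to the integrating business to issue, rotate and protect those keys.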

Recent examples of AI API attacks include the ZenML compromise and the Nvidia AI Platform vulnerability. While both have been addressed by their respective vendors, more will follow as cyber criminals expand and diversify their attacks against software supply chains.


Safeguarding your AI projects

None of this should be taken as a warning to stay away from AI. After all, you wouldn’t stop using email because of the risk of phishing scams. What these developments do mean is that AI is now the new frontier in cyber crime, and security must be hard-baked into everything you do when developing, deploying, using and maintaining AI-powered technologies — whether they’re your own or provided by a third-party vendor.

To do that, businesses need complete traceability for all components used in AI development. They also need full explainability and verification for every AI-generated output. You can’t do that without keeping humans in the loop and putting security at the forefront of your strategy. If, however, you view AI solely as a way to save time and cut costs by laying off workers, with little regard for the consequences, then it’s just a matter of time before disaster strikes.
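
One way to approach that traceability, sketched here as an illustration rather than a prescribed tool, is to record a manifest of every artifact used in a model build, in the spirit of a software bill of materials: each file's cryptographic digest plus a provenance note saying where it came from.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def build_manifest(artifact_paths: list[str], source_notes: dict) -> str:
    """Return a JSON manifest recording each artifact's SHA-256 digest and
    provenance (where it was obtained, who vetted it)."""
    entries = []
    for p in artifact_paths:
        entries.append({
            "file": p,
            "sha256": hashlib.sha256(Path(p).read_bytes()).hexdigest(),
            "source": source_notes.get(p, "unknown"),
        })
    manifest = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "artifacts": entries,
    }
    return json.dumps(manifest, indent=2)
```

Stored alongside each model release, a manifest like this lets teams answer, after the fact, exactly which data sets, weights and dependencies went into a given model, which is the starting point for tracing any compromise.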

AI-powered security solutions also play a critical role in countering the threats. They’re not a replacement for talented security analysts but a powerful augmentation that helps them do what they do best on a scale that would otherwise be impossible to achieve.